

@denisilie94 denisilie94 commented Dec 4, 2025

Summary

Standardizes task vector extraction in SCE on the logic of the get_task_vectors function in generalized_task_arithmetic.py, adding the submatrix support that SCE previously lacked.

Problem

SCE currently uses an independent implementation for extracting task vectors
This independent implementation lacks submatrix support
The get_task_vectors function in generalized_task_arithmetic.py already provides this functionality
Code duplication creates maintenance overhead and feature inconsistency. At some point SCE should benefit from the generalized_task_arithmetic.py implementation directly.

Solution

Adds submatrix support to SCE, following the existing get_task_vectors function
Ensures consistent logic across both implementations
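
For illustration, the intended extraction step can be sketched as below. This is a minimal sketch, not the code in this PR or in mergekit: the function name extract_task_vectors, its signature, and passing is_embed as a plain bool are assumptions made for the example. The rule it applies (slice embedding matrices, skip any other mismatched tensor) follows how the review comments on this PR describe get_task_vectors.

```python
import logging
from typing import Dict, List

import torch


def extract_task_vectors(
    base: torch.Tensor,
    tensors: Dict[str, torch.Tensor],
    is_embed: bool,
) -> List[torch.Tensor]:
    """Hypothetical helper: return one delta (tensor - base) per usable tensor."""
    deltas = []
    for name, t in tensors.items():
        t = t.to(base.dtype)  # normalize dtype before subtracting
        if t.shape != base.shape:
            if is_embed:
                # Embedding matrices are 2D, so a 2D submatrix slice is valid here.
                t = t[: base.shape[0], : base.shape[1]]
                logging.warning("Using submatrix of %s", name)
            else:
                # Any other mismatched tensor is skipped rather than sliced.
                logging.warning("Skipping %s due to size mismatch", name)
                continue
        deltas.append(t - base)
    return deltas
```

A caller would then stack the returned deltas (for example with torch.stack) and fall back to the base tensor when the list is empty.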


Note

Adds submatrix handling and dtype normalization to sce_merge, stacking only valid task vectors and warning when tensors are sliced.

  • Merge method sce (mergekit/merge_methods/sce.py):
    • Adds submatrix support by slicing tensors to match base_tensor when shapes differ, with a warning per tensor.
    • Normalizes tensor dtypes to base_tensor.dtype before computing task vectors.
    • Builds task vectors from validated tensors and stacks them; returns base tensor if none are valid.
    • Preserves existing masking (sce_mask), sign-consensus erasing, weighting, and merge logic.

Written by Cursor Bugbot for commit b318151.

# Handle shape mismatch - resize to base dimensions
if t.shape != base_tensor.shape:
    # Slice tensor to match base_tensor dimensions
    t = t[: base_tensor.shape[0], : base_tensor.shape[1]]

Bug: Submatrix slicing crashes on 1D tensors

The submatrix slicing t[: base_tensor.shape[0], : base_tensor.shape[1]] assumes tensors are 2D. If a 1D tensor (like a bias vector or layer norm weight) has a shape mismatch, accessing base_tensor.shape[1] will raise an IndexError. The reference implementation in generalized_task_arithmetic.py avoids this by only applying 2D slicing to is_embed weights, which are guaranteed to be 2D embedding matrices. The SCE implementation lacks this guard and applies 2D indexing unconditionally to any shape-mismatched tensor.
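
The failure is easy to reproduce in isolation. The snippet below is a sketch with made-up shapes; slice_to_base is a hypothetical helper, not part of mergekit, and it shows one rank-agnostic alternative to the hard-coded 2D slice rather than the guard used by the reference implementation.

```python
import torch


def slice_to_base(t: torch.Tensor, base: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: crop each dimension of t to base's size, for any rank."""
    if t.dim() != base.dim():
        raise ValueError(f"rank mismatch: {t.dim()} vs {base.dim()}")
    # Build one slice per dimension instead of hard-coding [:rows, :cols].
    return t[tuple(slice(0, n) for n in base.shape)]


base_1d = torch.zeros(4)  # e.g. a layer norm weight
t_1d = torch.ones(6)

try:
    t_1d[: base_1d.shape[0], : base_1d.shape[1]]
    hit_index_error = False
except IndexError:
    # base_1d.shape has a single entry, so shape[1] is out of range.
    hit_index_error = True

cropped = slice_to_base(t_1d, base_1d)  # crops along the only dimension
```

Whether cropping a mismatched 1D tensor is meaningful for a merge is a separate question; skipping such tensors, as the reference is described as doing, avoids it altogether.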


logging.warning(f"Using submatrix of tensor {idx}")

# Compute task vector (delta)
task_vector = t - base_tensor

Bug: Slicing fails when tensor is smaller than base

The submatrix slicing t[: base_tensor.shape[0], : base_tensor.shape[1]] only works when t is at least as large as base_tensor in every dimension. If t is smaller in any dimension (e.g., a model with a smaller vocabulary), the slice cannot enlarge it, so that dimension still mismatches and the subsequent subtraction t - base_tensor on line 45 will raise a broadcasting error. The reference implementation in generalized_task_arithmetic.py handles this by skipping non-embedding tensors with mismatches entirely, but the SCE implementation unconditionally attempts to proceed.
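
A small reproduction, with made-up shapes, is sketched below. The covers function is a hypothetical pre-check introduced only for this example; it is not an existing mergekit function.

```python
import torch


def covers(t: torch.Tensor, base: torch.Tensor) -> bool:
    """Hypothetical check: True if t is at least as large as base in every dimension."""
    return t.dim() == base.dim() and all(
        a >= b for a, b in zip(t.shape, base.shape)
    )


base = torch.zeros(8, 4)
small = torch.ones(6, 4)  # fewer rows than base, e.g. a smaller vocabulary

# Slice bounds past the end are clamped, so the result keeps small's 6 rows.
sliced = small[: base.shape[0], : base.shape[1]]

try:
    sliced - base
    subtraction_failed = False
except RuntimeError:
    # Shapes (6, 4) and (8, 4) cannot be broadcast together.
    subtraction_failed = True
```

Calling a check like covers(t, base_tensor) before slicing, and skipping the tensor when it returns False, is one way to keep the subtraction from being reached with incompatible shapes.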

